An empirical look at candidates’ age.
library(tidyverse)
library(here)
library(extrafont)
loadfonts(device = "win", quiet = T)
library(hrbrthemes)
hrbrthemes::update_geom_font_defaults(
family = "Roboto Condensed",
size = 3.5,
color = "grey50"
)
library(scales)
library(knitr)
library(paletteer)
library(ggtext)
library(glue)
library(pdftools)
library(svglite)
library(tictoc)
library(tidytext)
library(gt)
library(reactable)
library(ggforce)
library(ggiraph)
library(htmltools)
knit_hooks$set(wrap = function(before, options, envir) {
if (before) {
paste0("<", options$wrap, ">")
} else {
paste0("</", options$wrap, ">")
}
})
knitr::opts_chunk$set(
fig.align = "left",
message = FALSE,
warning = FALSE,
dev = "svglite",
# dev.args = list(type = "CairoPNG"),
dpi = 300,
out.width = "100%"
)
options(width = 180, dplyr.width = 150)
plot_bg_color <- readr::read_file(file=here::here("theme.css")) %>%
str_extract(., regex("(?<=blog-bg-color:).*?(?=;)")) %>%
str_trim() %>%
str_extract(., regex("^#\\S+"))
caption_table <- "Source:\ndata: https://www.wien.gv.at/politik/wahlen/grbv/2020/ analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes"
fn_reactable_headings <- function(header, subtitle, table, caption) {
div(class="reactable-table",
div(
class="reactable-title",
header
),
div(
class="reactable-subtitle",
subtitle
),
table,
div(class="reactable-caption",
caption
)
)
}
theme_post <- function() {
hrbrthemes::theme_ipsum_rc() +
theme(
plot.background = element_rect(fill = plot_bg_color, color=NA),
panel.background = element_rect(fill = plot_bg_color, color=NA),
#panel.border = element_rect(colour = plot_bg_color, fill=NA),
#plot.border = element_rect(colour = plot_bg_color, fill=NA),
plot.margin = margin(l = 0,
t = 0.25,
unit = "cm"),
plot.title = element_markdown(
color = "grey20",
face = "bold",
margin = margin(l = 0, unit = "cm"),
size = 11
),
plot.title.position = "plot",
plot.subtitle = element_text(
color = "grey50",
margin = margin(t = 0.2, b = 0.3, unit = "cm"),
size = 10
),
plot.caption = element_text(
color = "grey50",
size = 8,
hjust = c(0)
),
plot.caption.position = "panel",
axis.title.x = element_text(
angle = 0,
color = "grey50",
hjust = 1
),
axis.text.x = element_text(
size = 9,
color = "grey50"
),
axis.title.y = element_blank(),
axis.text.y = element_text(
size = 9,
color = "grey50"
),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.y = element_blank(),
panel.spacing = unit(0.25, "cm"),
panel.spacing.y = unit(0.25, "cm"),
strip.text = element_text(
angle = 0,
size = 9,
vjust = 1,
face = "bold"
),
legend.title = element_text(
color = "grey30",
face = "bold",
vjust = 1,
size = 7
),
legend.text = element_text(
size = 7,
color = "grey30"
),
legend.justification = "left",
legend.box = "horizontal", # arrangement of multiple legends
legend.direction = "vertical",
legend.margin = margin(l = 0, t = 0, unit = "cm"),
legend.spacing.y = unit(0.07, units = "cm"),
legend.text.align = 0,
legend.box.just = "top",
legend.key.height = unit(0.2, "line"),
legend.key.width = unit(0.5, "line"),
text = element_text(size = 5)
)
}
data_date <- format(Sys.Date(), "%d %b %Y")
my_caption <- glue::glue("data: https://www.wien.gv.at/politik/wahlen/grbv/2020/\nanalysis: Roland Schmidt | @zoowalk | https://werk.statt.codes")
Elections in Vienna are today and while glancing through the electoral lists I couldn’t help but paying attention to candidates’ birth years. Maybe that’s an age thing… This got me thinking that I haven’t seen any more systematic analysis of parties’/candidates’ age profile. So as a modest contribution to this end, here are my two cents. Again, I’ll focus mainly on the pertaining steps in R and related number crunching. Due to a lack of time and not being an expert on Vienna’s electoral system, I’ll be brief when it comes to substantive matters. But the presented results hopefully provide sufficient material to dig into.
As always, if you see any glaring error or have any constructive comment, feel free to let me now (best via twitter DM).
Again, as so often, the trickiest part is to get the data ‘liberated’ from the format it is provided in. The entire list of candidates is published in this pdf. Note that there are three lists: One for the city council (‘Gemeinderat’; composed on the basis of the results in 18 multi-member electoral districts), one for the 23 district councils (‘Bezirksrat’; one in each district), and the ‘city election proposal’ (‘Stadtwahlvorschlag’; admittedly a somewhat clumsy translation). The latter doesn’t constitute a body in itself, but serves to allocate mandates which remained unassigned after counting the votes for the city council (‘zweites Ermittlungsverfahren’/similar to the d’Hondt procedure).
When it comes to extracting the data from the linked pdf, a difficulty may arise due to the two-column format of the document. Hence, simple row-wise extraction doesn’t help much since it would put candidates together which could be from different parties. Similarly, simply isolating the two columns and extracting candidates would also not do the trick since breaks betwee parties, districts etc run over two, and not one column. To illustrate this, I resorted to cutting edge technology and drew the two arrows below:

Luckily, the tabulizer package is not only very powerful when it comes to extracting text/data from a pdf, it is also sophisticated enough to take into consideration the text flow highlighted above. I am not familiar with the underlying heuristic, but I assume it is contingent on consistently formatted section headings. Hence, empowered with this tool, retrieving the text becomes rather effortless. The subsequent steps are a battery of regular expressions to extract the specific data we are interested in. To see the code, unfold the snippets below.
df_clean <- df_raw %>%
ungroup() %>%
mutate(text_split=str_split(text_raw, regex("\r\n\\s*(?=\\d+\\.)"))) %>%
unnest_longer(text_split) %>%
mutate(text_split=text_split %>% str_squish() %>% str_trim()) %>%
mutate(text_split=str_split(text_split, ".(?=Zustellungsbevollmächtigte(r)? Vertreter(in)?)")) %>%
unnest_longer(text_split)
# get Listenplatz ------------------------------------------------------------
df_clean <- df_clean %>%
mutate(listenplatz=str_extract(text_split, regex("^\\d+\\.?\\s+(?!Bezirk)")) %>%
str_extract(., "\\d*") %>% as.numeric())
# get elections -----------------------------------------------------------
df_clean <- df_clean %>%
mutate(election=text_raw %>% str_extract(., regex("(?<=[A-Z]\\.)\\s*[A-z]+wahl(en)?", dotall = T)) %>%
str_trim(., side=c("both"))) %>%
tidyr::fill(election, .direction="down")
# electoral district --------------------------------------------------------------
df_clean <- df_clean %>%
mutate(wahlkreis=case_when(election=="Bezirksvertretungswahlen" ~ str_extract(text_split, "\\d{1,2}\\. Bezirk"),
election=="Gemeinderatswahl"~ str_extract(text_split, regex("Wahlkreis.*?(?=[:upper:]{2,}?)",
dotall = T,
multiline = T)),
election=="Stadtwahl" ~ as.character("Stadtwahl"),
TRUE ~ as.character("missing"))) %>%
mutate(wahlkreis=str_trim(wahlkreis, side=c("both"))) %>%
tidyr::fill(wahlkreis, .direction="down") %>%
mutate(wahlkreis=str_remove(wahlkreis, "Wahlkreis ") %>%
str_remove(., regex("\\(.*\\)")) %>%
str_trim(., side=c("both")))
# other -------------------------------------------------------------------
df_clean <- df_clean %>%
mutate(page=text_raw %>% str_extract(., regex("Seite \\d+")) %>% str_extract(., "\\d+") %>%
as.numeric()) %>%
mutate(name=str_extract(text_split, regex("(?<=\\d\\.\\s?).*?(?=,\\s?\\d{4},)")) %>%
str_trim(., side=c("both"))) %>%
mutate(first_name=text_split %>% str_extract(., regex("[:alpha:]*(?=,\\s?\\d+)"))) %>%
mutate(year_birth=text_split %>% str_extract(., regex("\\d{4}")) %>%
as.numeric()) %>%
mutate(year_interval=cut(year_birth, seq(1930, 2005, 5))) %>%
mutate(plz=text_split %>% str_extract(., regex("\\d{4}\\s(?=Wien)")) %>%
str_trim(., side=c("both")))
# get party ---------------------------------------------------------------
df_clean <- df_clean %>%
mutate(party=text_split %>% str_extract(., regex("(?<=^Zustellung)[:alpha:]*$"))) %>%
mutate(party=case_when(lead(listenplatz==1) ~ str_extract(text_split, regex("\\w+$")),
TRUE ~ NA_character_)) %>%
tidyr::fill(party, .direction = "down") %>%
mutate(party=party %>%
as_factor() %>%
fct_relevel(., sort) %>%
fct_relevel(., "SPÖ", "FPÖ", "GRÜNE", "ÖVP", "NEOS"))
# wrap up -----------------------------------------------------------------
df_clean <- df_clean %>%
mutate(wahlkreis_plz=str_extract(wahlkreis, regex("\\d+")) %>%
as.numeric()+100) %>%
mutate(wahlkreis_plz=wahlkreis_plz %>% as.character() %>% paste0(., "0")) %>%
mutate(wahlkreis_plz=case_when(str_detect(wahlkreis, "Zentrum") ~ "1010, 1040, 1050, 1060",
str_detect(wahlkreis, "Innen") ~ "1070, 1080, 1090",
str_detect(wahlkreis, "Leopoldstadt") ~ "1020",
str_detect(wahlkreis, "Landstraße") ~ "1030",
str_detect(wahlkreis, "Favoriten") ~ "1100",
str_detect(wahlkreis, "Simmering") ~ "1110",
str_detect(wahlkreis, "Meidling") ~ "1120",
str_detect(wahlkreis, "Hietzing") ~ "1130",
str_detect(wahlkreis, "Penzing") ~ "1140",
str_detect(wahlkreis, "Rudolf") ~ "1150",
str_detect(wahlkreis, "Ottakring") ~ "1160",
str_detect(wahlkreis, "Hernals") ~ "1170",
str_detect(wahlkreis, "Währing") ~ "1180",
str_detect(wahlkreis, "Döbling") ~ "1190",
str_detect(wahlkreis, "Brigittenau") ~ "1200",
str_detect(wahlkreis, "Floridsdorf") ~ "1210",
str_detect(wahlkreis, "Donaustadt") ~ "1220",
str_detect(wahlkreis, "Liesing") ~ "1230",
TRUE ~ as.character(wahlkreis_plz))) %>%
mutate(residence=case_when(
str_detect(wahlkreis_plz, plz) ~ "inside",
!str_detect(wahlkreis_plz, plz) ~ "outside",
TRUE ~ as.character("missing"))) #%>%
df_clean <- df_clean %>%
select(-text_raw) %>%
filter(!is.na(listenplatz))
After these few steps we have a searchable/sortable table of all candidates (or better candidatures since one person can be candidate on multiple districts/lists). There have been 8,983 candidatures by 5,038 individuals.
The table essentially provides all necessary data for the subsequent analysis. While the pdf does not include the exact birth date of each candidate, it provides us with their birth years which we can take as a proxy for age. Note that I also extracted candidates’ residence zip code to see how often place of residence and candidature actually overlap (see below).
tb_all <- reactable(df_clean %>%
select(election, wahlkreis, party, name, listenplatz, year_birth, plz),
columns=list(election=colDef(name="Wahl", width=130),
wahlkreis=colDef(name="Wahlbezirk", width=100),
party=colDef(name="Partei", width=50),
name=colDef(name="KandidatIn"),
listenplatz=colDef(name="Listenplatz",
width=70,
align="center"),
year_birth=colDef(name="Geburtsjahr", width=90),
plz=colDef(name="PLZ Wohnort", width=90)),
bordered=F,
compact = TRUE,
highlight = TRUE,
style = list(fontSize = "10px"),
filterable = TRUE,
defaultPageSize = 23,
theme = reactableTheme(
borderColor = "#7f7f7f",
borderWidth = 1,
backgroundColor = "#f0eff0",
filterInputStyle = list(
color="green",
backgroundColor = plot_bg_color)))
html_tb_all <- div(class="reactable-table",
div(
div(
class="reactable-title",
"WIEN-WAHL 2020: Liste aller KandidatInnen"
),
),
tb_all,
div(class="reactable-caption",
caption_table)
)
Let’s now look at the overall youngest and oldest candidates. We could retrieve this information already from the main table provided above (sort column birth year). Here, however, let’s nest candidates’ different candidatures for the sake of clarity.
df_main <- df_clean %>%
distinct(name, year_birth, party) %>%
slice_max(.,order_by=year_birth, n=10) %>%
arrange(name, desc(year_birth)) %>%
mutate(index=dplyr::min_rank(year_birth)) %>%
mutate(index_name=paste(index, ". ",name), .before=1) %>%
select(-index)
tb_young <- reactable(df_main,
columns=list(index_name=colDef(name="KandidatIn",
width=130),
name=colDef(show=F),
year_birth=colDef(name="Geburtsjahr",
width=70),
party=colDef(name="Partei",
width=50)),
pagination = FALSE,
onClick = "expand",
bordered=F,
compact = TRUE,
highlight = TRUE,
rowStyle = list(cursor = "pointer"),
style = list(fontSize = "10px"),
theme = reactableTheme(
borderWidth = 1,
borderColor = "#7f7f7f",
backgroundColor = plot_bg_color,
filterInputStyle = list(
color="green",
backgroundColor = plot_bg_color)),
details=function(index){
df_nested <- df_clean %>%
slice_max(.,order_by=year_birth, n=10) %>%
select(name, election, wahlkreis, listenplatz) %>%
filter(name==df_main$name[index]) %>%
select(-name)
tbl_nested <- reactable(df_nested,
columns = list(
election=colDef(name="Wahl"),
wahlkreis=colDef(name="Wahlbezirk"),
listenplatz=colDef(name="Listenplatz")
),
outlined = TRUE,
highlight = TRUE,
fullWidth = TRUE,
theme = reactableTheme(
backgroundColor = "#ab8cab"))
htmltools::div(style = list(margin = "12px 45px"), tbl_nested)}
)
# html_tb_young <- div(
# div(
# div(
# class="reactable-title",
# "WIEN-WAHL 2020: Jüngesten KandidatInnen"
# ),
# div(
# class="reactable-subtitle",
# "Geburtsjahr lt. Wahlvorschlag als Basis. Top 10."
# ),
# ),
# tb_young_old)
html_tb_young <- fn_reactable_headings(header="WIEN-WAHL 2020: Jüngesten KandidatInnen",
subtitle="Geburtsjahr lt. Wahlvorschlag als Basis. Top 10.",
table=tb_young,
caption=caption_table)
As becomes clear from the table, there are overall 16 candidates who were all born in 2002.
df_main <- df_clean %>%
distinct(name, year_birth, party) %>%
slice_min(., order_by=year_birth, n=10) %>%
arrange(year_birth, name) %>%
mutate(index=dplyr::min_rank(year_birth)) %>%
mutate(index_name=paste(index, ". ", name), .before=1) %>%
select(-index)
tb_old <- reactable(df_main,
columns=list(index_name=colDef(name="KandidatIn",
width=130),
name=colDef(show=F),
year_birth=colDef(name="Geburtsjahr",
width=70),
party=colDef(name="Partei",
width=50)),
pagination = FALSE,
onClick = "expand",
bordered=F,
compact = TRUE,
highlight = TRUE,
rowStyle = list(cursor = "pointer"),
style = list(fontSize = "10px"),
theme = reactableTheme(
borderWidth = 1,
borderColor = "#7f7f7f",
backgroundColor = plot_bg_color,
filterInputStyle = list(
color="green",
backgroundColor = plot_bg_color)),
details=function(index){
df_nested <- df_clean %>%
slice_min(.,order_by=year_birth, n=10) %>%
select(name, election, wahlkreis, listenplatz) %>%
filter(name==df_main$name[index]) %>%
select(-name)
tbl_nested <- reactable(df_nested,
columns = list(
election=colDef(name="Wahl"),
wahlkreis=colDef(name="Wahlbezirk"),
listenplatz=colDef(name="Listenplatz")
),
outlined = TRUE,
highlight = TRUE,
fullWidth = T,
theme = reactableTheme(
backgroundColor = "#ab8cab"))
htmltools::div(style = list(margin = "12px 45px"), tbl_nested)}
)
html_tb_old <- fn_reactable_headings(header="WIEN-WAHL 2020: Ältesten KandidatInnen",
subtitle = "Geburtsjahr lt. Wahlvorschlag als Basis. Top 10.",
table=tb_old,
caption=caption_table)
The oldest candidate is Waschiczek Wolfgang, who was born in 1928. Not bad.
Let’s now look at the average year of birth of parties’ candidates on each of the different electoral levels. The table below provides the median, mean and standard deviation for each party. The thin white line in the density plots on the right indicates the median.
# summarize data (median, mean, sd)
df_list_age <- df_clean %>%
group_by(election, party) %>%
summarize(year_median=median(year_birth, na.rm = T),
year_mean=mean(year_birth, na.rm=T),
year_sd=sd(year_birth, na.rm=T)) %>%
group_by(election) %>%
arrange(desc(year_median), .by_group=T) #order as gt table
#create graphs for table
## define function creating plot
fn_plot <- function(data){
data %>%
ggplot()+
ggridges::geom_density_ridges(aes(x=year_birth,
y=0),
fill="firebrick",
color=plot_bg_color,
quantile_lines=T,
quantiles=2,
panel_scaling = F,
size=12)+
scale_x_continuous(limits=c(min(df_clean$year_birth),
max(df_clean$year_birth)),
expand=expansion(mult=0))+
scale_y_discrete(expand=expansion(mult=0))+
theme(
plot.background = element_rect(fill = plot_bg_color, color=NA),
panel.background = element_rect(fill = plot_bg_color, color=NA),
plot.margin = margin(0, unit="cm"),
axis.text = element_blank(),
axis.title = element_blank()
)
}
## apply function, dataframe with plots
box_plot <- df_clean %>%
select(election, party, year_birth) %>%
group_by(election, party) %>%
mutate(year_median=median(year_birth, na.rm = T)) %>%
#ungroup() %>%
nest(year_birth_nest=c(year_birth)) %>%
mutate(plot=map(year_birth_nest, fn_plot)) %>%
group_by(election) %>%
arrange(desc(year_median), .by_group=T) #order as gt table
#create gt table & insert df with plots;
tb_list_age <- df_list_age %>%
mutate(year_mean=round(year_mean, digits = 2)) %>%
select(election, party, contains("year")) %>%
group_by(election) %>%
arrange(desc(year_median), .by_group=T) %>% #order as plots
mutate(index=min_rank(-year_median), .before=1) %>%
mutate(index_party=paste0(index,". ", party)) %>%
ungroup() %>%
select(-index, -party) %>%
mutate(boxplot=NA) %>%
gt(groupname_col = "election", rowname_col = "index_party") %>%
tab_header(title=md("**WIEN-WAHLEN 2020:<br>Durchschnittliches Geburtsjahr der KandidatInnen**")) %>%
gt::cols_label(index_party="Partei",
year_median="Median",
year_mean="arith. Mittel",
year_sd="Std. Abw.",
boxplot="Verteilung") %>%
gt::fmt_number(columns=vars(year_sd),
decimals=2,
suffixing=F) %>%
gt::text_transform(
locations=cells_body(vars(boxplot)),
fn=function(x){
map(box_plot$plot, ggplot_image, height=px(11), aspect_ratio=4)}
) %>%
cols_width(
vars(boxplot) ~ px(200),
vars(index_party) ~ px(100)
) %>%
tab_options(heading.align = "left",
table.background.color = plot_bg_color,
table.width=pct(100),
table.font.size = "11px",
row_group.font.weight = "bold",
data_row.padding = px(0),
table.align = "left") %>%
tab_footnote(
footnote = "Vertical white line indicates median.",
locations = cells_column_labels(
columns = vars(boxplot))
) %>%
tab_source_note(
source_note = my_caption)
# gtsave(tb_list_age, "tb_list_age.png", path = here::here("_blog_data"))
| WIEN-WAHLEN 2020: Durchschnittliches Geburtsjahr der KandidatInnen |
||||
|---|---|---|---|---|
| Median | arith. Mittel | Std. Abw. | Verteilung1 | |
| Gemeinderat | ||||
| 1. VOLT | 1991.5 | 1992.12 | 2.85 | |
| 2. SÖZ | 1988.0 | 1985.63 | 10.57 | |
| 3. BIER | 1986.0 | 1984.50 | 5.83 | |
| 4. LINKS | 1983.0 | 1978.02 | 16.36 | |
| 5. WIFF | 1981.0 | 1975.00 | 26.75 | |
| 6. NEOS | 1977.0 | 1977.59 | 13.20 | |
| 7. SPÖ | 1976.0 | 1975.66 | 12.42 | |
| 8. ÖVP | 1975.0 | 1975.26 | 16.06 | |
| 9. GRÜNE | 1973.0 | 1973.43 | 12.83 | |
| 10. FPÖ | 1970.0 | 1971.81 | 14.96 | |
| 11. HC | 1968.0 | 1970.49 | 12.92 | |
| 12. PRO | 1959.0 | 1963.27 | 13.30 | |
| Bezirksvertretungen | ||||
| 1. VOLT | 1992.0 | 1992.27 | 4.18 | |
| 2. WANDL | 1990.0 | 1986.62 | 12.73 | |
| 3. KURZ | 1989.0 | 1987.71 | 7.90 | |
| 4. SÖZ | 1988.0 | 1985.35 | 11.17 | |
| 5. BIER | 1986.0 | 1986.77 | 3.26 | |
| 6. LINKS | 1985.0 | 1979.47 | 15.52 | |
| 7. NEOS | 1980.0 | 1979.68 | 14.28 | |
| 8. GRÜNE | 1973.0 | 1973.55 | 13.48 | |
| 9. SPÖ | 1972.0 | 1972.73 | 15.30 | |
| 9. ÖVP | 1972.0 | 1972.44 | 18.37 | |
| 11. PdA | 1969.0 | 1967.67 | 13.05 | |
| 12. WIR | 1968.5 | 1969.25 | 5.95 | |
| 13. FPÖ | 1968.0 | 1969.75 | 16.46 | |
| 13. HC | 1968.0 | 1970.16 | 12.91 | |
| 13. VOLK | 1968.0 | 1970.67 | 15.18 | |
| 16. PRO | 1964.5 | 1964.33 | 13.21 | |
| 17. WIEN | 1960.0 | 1958.67 | 7.89 | |
| 18. WIFF | 1958.0 | 1961.77 | 20.93 | |
| 19. PH | 1955.0 | 1957.40 | 5.94 | |
| Stadt | ||||
| 1. VOLT | 1991.5 | 1992.12 | 2.95 | |
| 2. LINKS | 1987.0 | 1981.74 | 13.88 | |
| 3. BIER | 1986.0 | 1984.70 | 6.52 | |
| 4. SÖZ | 1984.5 | 1983.65 | 11.38 | |
| 5. ÖVP | 1977.0 | 1976.58 | 15.48 | |
| 6. NEOS | 1976.5 | 1977.02 | 13.89 | |
| 7. SPÖ | 1976.0 | 1975.51 | 13.18 | |
| 8. GRÜNE | 1973.0 | 1973.79 | 12.25 | |
| 9. PRO | 1970.0 | 1966.60 | 9.40 | |
| 10. FPÖ | 1968.0 | 1969.77 | 15.82 | |
| 11. HC | 1967.0 | 1968.61 | 13.00 | |
| data: https://www.wien.gv.at/politik/wahlen/grbv/2020/ analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes | ||||
|
1
Vertical white line indicates median.
|
||||
I won’t dig into every details, so only a few general remarks: On the city council level (‘Gemeinderat’) the established parties (SPÖ, FPÖ, Greens, ÖVP, Neos) feature (on average) somewhat older candidates than other, newer parties. On the district level (‘Bezirksvertretungen’), the former are occupying more of a middle ground. Among them, Neos’ candidates are on average younger, FPÖ candidates older. Funnily, when ranking parties according their median age, FPÖ and Team HC Strache (=former leader of FPÖ) are always adjacent.
Let’s now take up a ‘geographical perspective’ and see how birth years are distributed within electoral districts and the difference between the latter.
df_pl <- df_clean %>%
group_by(election) %>%
mutate(year_median_election=median(year_birth, na.rm=T)) %>%
group_by(election, wahlkreis) %>%
mutate(year_median_wahlkreis=median(year_birth, na.rm=T)) %>%
mutate(year_mean_wahlkreis=mean(year_birth, na.rm=T)) %>%
mutate(year_sd_wahlkreis=sd(year_birth, na.rm=T)) %>%
ungroup() %>%
mutate(wahlkreis_fct=fct_reorder(wahlkreis, year_median_wahlkreis))
ls_pl <- df_pl %>%
group_split(election) %>%
imap(~ggplot(data=.x)+
labs(title=glue::glue("WIEN-WAHLEN 2020 <span style=color:firebrick>{unique(.x$election)}</span>:<br>Verteilung des Geburtsjahres von KandidatInnen pro Wahlbezirk"),
subtitle="Median bei vertikaler weißer Linie.",
x="Geburtsjahr",
caption=my_caption)+
ggridges::geom_density_ridges(aes(y=wahlkreis_fct,
x=year_birth),
fill="firebrick",
scale=2,
rel_min_height = 0.01,
quantile_lines=T,
quantiles=2,
color="white")+
geom_text(aes(y=length(unique(.x$wahlkreis)),
x=2010),
label="Median",
nudge_y=+1.5,
size = 3,
check_overlap = T,
hjust=0) +
geom_text(aes(y=length(unique(.x$wahlkreis)),
x=2020),
label="Mean",
check_overlap = T,
nudge_y=+1.5,
size = 3,
hjust=0) +
geom_text(aes(y=length(unique(.x$wahlkreis)),
x=2030),
label="SD",
nudge_y=+1.5,
size = 3,
check_overlap = T,
hjust=0) +
geom_text(aes(y=wahlkreis_fct,
x=2010,
label=glue::glue("{year_median_wahlkreis}")),
check_overlap = T,
size = 3,
color = "grey50",
hjust=0,
nudge_y = 0.5
)+
geom_text(aes(y=wahlkreis_fct,
x=2020,
label=glue::glue("{round(year_mean_wahlkreis, digits=1)}")),
check_overlap = T,
size = 3,
color = "grey50",
size=9,
hjust=0,
nudge_y = 0.5
)+
geom_text(aes(y=wahlkreis_fct,
x=2030,
label=glue::glue("{round(year_sd_wahlkreis, digits=2)}")),
check_overlap = T,
size = 3,
color = "grey50",
hjust=0,
nudge_y = 0.5
)+
scale_y_discrete(labels=function(x) {stringr::str_remove(x, regex("___.*")) %>%
str_trunc(., width=10, side="right", ellipsis = "...")},
expand=expansion(mult=c(0,.1)))+
scale_x_continuous(limits=c(1925, 2030),
breaks=seq(1950, 2000, 10),
expand=expansion(mult=c(0,.10)))+
theme_post()+
theme(legend.position="none",
axis.title.x=element_text(hjust=0.7),
axis.text.y=element_text(vjust=0)))
There is quite a difference in the median value between Ottakring (the youngest, 1979) and Donaustadt (the oldest, 1970.5). But then again the median value is just an aggregate value and conveys only a bit of the wider picture. I think the distribution curves are more telling. Whether that’s useful information or not, I am not entirely sure. At least it’s something, I have not seen before.
Now let’s dig deeper and see for the distribution of birth years by electoral districts and parties, again for each election (city council, district councils).
Hover with the mouse over the box plot and dots to obtain the pertaining data.
df_age_list_district_graph <- df_clean %>%
filter(!str_detect(election, "Stadt")) %>%
group_by(election, wahlkreis, party) %>%
mutate(year_median=median(year_birth, na.rm=T),
year_mean=mean(year_birth, na.rm=T),
year_se=sd(year_birth, na.rm=T),
values_nested=list(year_birth=year_birth)) %>%
group_by(election) %>%
mutate(median_election=median(year_birth, na.rm = T)) %>%
group_by(election, wahlkreis) %>%
mutate(median_wahlkreis=median(year_birth, na.rm=T)) %>%
ungroup() %>%
mutate(wahlkreis_num=str_extract(wahlkreis, regex("\\d+")) %>% as.numeric()) %>%
mutate(wahlkreis_fac=fct_reorder(wahlkreis, wahlkreis_num)) %>%
mutate(party=fct_rev(party))
ls_age_list_district_graph <- df_age_list_district_graph %>%
mutate(election=fct_relevel(election, "Gemeinderat", "Bezirksvertretungen")) %>% group_split(election) %>%
imap(~ggplot(data=.x)+
labs(title=glue::glue("WIEN-WAHL 2020: <span style=color:firebrick>{unique(.x$election)}</span><br>Verteilung des Geburtsjahres pro Wahlbezirk und Partei"),
#subtitle="Geburtsjahr als proxy für Alter.",
x="Geburtsjahr",
caption=my_caption)+
geom_jitter_interactive(aes(y=tidytext::reorder_within(
x=party,
by=year_median,
within=wahlkreis_fac
),
x=year_birth,
tooltip=glue::glue("{name}
{party}
{year_birth}"),
data_id=paste(name)),
color="grey50",
size=.5,
alpha=0.5)+
geom_boxplot_interactive(aes(
y=tidytext::reorder_within(
x=party,
by=year_median,
within=wahlkreis_fac
),
x=year_birth,
tooltip=glue::glue("{party}
Median: {year_median}
arith Mittelw.: {round(year_mean, 2)}
Std.Abw.: {round(year_se,2)}"),
data_id=paste(wahlkreis, party)),
outlier.shape = NA,
size=.2,
color="#374E55FF",
fill=NA)+
geom_vline(aes(xintercept=median_election,
color="color_vline",
group=wahlkreis,
name=""),
#key_glyph=c("vline"),
linetype="dotted"
)+
scale_color_manual(labels=c(color_vline="median of all candidates"),
values=c(color_vline="firebrick"))+
scale_x_continuous(labels=function(x) str_sub(x, start=3),
breaks=seq(1940, 2000, 20))+
tidytext::scale_y_reordered()+
facet_wrap(vars(wahlkreis_fac),
scales="free",
ncol=4,
labeller=as_labeller(function(x) str_trunc(x,
width=15,
side=c("right"),
ellipsis="...")
)
)+
theme_post()+
theme(
panel.grid.major.y = element_blank(),
axis.text.y = element_markdown(size=6),
axis.text.x = element_markdown(size=6),
axis.title.x = element_markdown(size=6),
legend.position = "top",
legend.justification = "left",
legend.title=element_blank()
))
Below essentially the same information as provided in the graph above but in a tabular form.
df_age_list_district <- df_clean %>%
#filter(!str_detect(election, "Stadt")) %>%
group_by(election, wahlkreis, party) %>%
summarize(year_median=median(year_birth, na.rm=T),
year_mean=mean(year_birth, na.rm=T),
year_se=sd(year_birth, na.rm=T),
values_nested=list(year_birth=year_birth)) %>%
ungroup() %>%
mutate(values_df=map(values_nested, as_tibble_col, column_name ="birth_year"))
tb_age_list_district <- df_age_list_district %>%
reactable(.,
columns=list(election=colDef(name="Wahl"),
wahlkreis=colDef(name="Wahlkreis"),
party=colDef(name="Partei",
width=70),
year_median=colDef(name="Median",
format=colFormat(digits=1)),
year_mean=colDef(name="arith. Mittel",
format=colFormat(digits=1)),
values_nested=colDef(show = F),
values_df=colDef(show = F),
year_se=colDef(name="Std.Abw.",
format=colFormat(digits=1))),
bordered=F,
compact = TRUE,
highlight = TRUE,
style = list(fontSize = "10px"),
filterable = TRUE,
theme = reactableTheme(
borderWidth = 1,
borderColor = "#7f7f7f",
backgroundColor = plot_bg_color,
filterInputStyle = list(
color="green",
backgroundColor = plot_bg_color))
)
html_age_list_district <- fn_reactable_headings(header ="WIEN-WAHL 2020: Durchschnittliches Alter pro Wahlbezirk", subtitle = "Geburtsjahr lt. Wahlvorschlag als Basis.",
table=tb_age_list_district,
caption=caption_table)
Another question which came to my mind: How is a candidate’s birth year/age related to his or her position on the electoral list. Are more senior candidates more likely to be on more attractive, i.e. top positions? Or is youth related to competitive positions? Are there any differences between the parties or elections (city vs district level) when it comes to this relation? Hover with the mouse over the individual dots to obtain data on indivdual candidates.
ls_pl_age_rank_district <- df_clean %>%
group_split(election) %>%
imap(~ggplot(data=.x)+
labs(
title=glue::glue("WIEN-WAHL 2020: <span style = color:firebrick>{unique(.x$election)}</span><br>Listenplatz und Geburtsjahr"),
subtitle="Loess regression.",
x="Geburtsjahr",
y="Listenplatz",
caption=my_caption)+
geom_jitter_interactive(aes(x=year_birth,
y=listenplatz,
tooltip=glue::glue("{name}
Geb.Jahr: {year_birth}
Listenplatz: {listenplatz}
Wahlkreis: {wahlkreis}
")),
color="grey50",
alpha=0.2)+
geom_smooth(aes(x=year_birth,
y=listenplatz),
method='loess',
formula= y~x,
size=0.5,
color="firebrick")+
scale_x_continuous(labels=function(x) str_sub(x, start=3),
breaks=seq(1940, 2000, 20))+
facet_wrap(vars(party))+
theme_post()+
theme(axis.title.x=element_text(size=6),
axis.title.y=element_text(
angle=90,
size=6,
color="grey50",
hjust=1),
axis.text.y = element_markdown(size=6),
axis.text.x = element_markdown(size=6)))
Interestingly, there are some subtle differences between the regression lines’ slope. Note how it increases for higher birth years for the SPÖ and the FPÖ. In other words, younger candidates tend to be on higher (less attractive) positions on the electoral list for these two parties. As for the other major parties, there is no comparable relation, at least not as visible as in the two previous cases. Obviously, this is purely descriptive.
To see the results for the other electoral levels unfold the sections below.
When extracting data from the pdf document with candidates’ details, I also retrieved the zip code of their places of residence. The idea was to see whether candidates actually live in the electoral districts where they are running for office. For the lack of a better term, I call these candidates ‘non-resident candidates.’
I could imagine that ‘non-residency’ conflates quite a number of factors, and the reasons and consequences of ‘non-residency’ are likely to be quite complex. A possible hypotheses could be that parties with high number of non-resident candidates are unevenly institutionalized across the city and hence have to bring in members from outside of the electoral district. But again, this is purely speculative and I haven’t thought, let alone read about it systematically. Nevertheless, as some kind of tentative inquiry, I thought it’s worth looking at the numbers. Three specific questions came to my mind: 1) Are there parties which feature particularly frequently such non-resident candidates? 2) Are there districts where non-resident candidates are particularly frequently running for office? 3) Do the shares of non-resident candidates differ between elections for the city council and for the district councils? I would think that the relation between a candidate and her electoral district, i.e. place of residence, is more of a significance in the latter case.
#party with most out-of-residence candidates
df_residence_party <- df_clean %>%
filter(!str_detect(election, "Stadt")) %>%
group_by(election, party, residence) %>%
summarize(n_residence=n()) %>%
mutate(rel_residence=n_residence/sum(n_residence, na.rm = T)) %>%
filter(residence=="outside") %>%
arrange(desc(rel_residence)) %>%
ungroup()
#fn for barchart
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
bar <- div(style = list(background = fill, width = width, height = height))
chart <- div(style = list(flexGrow = 1, marginRight = "8px", background = background), bar)
div(style = list(display = "flex", alignItems = "right"), chart, label)
}
#assemble table
tb_residence_party <- df_residence_party %>%
select(party, everything()) %>%
reactable(.,
columns = list(
party=colDef(name="Party"),
election=colDef(name="Election"),
residence=colDef(show=FALSE),
n_residence = colDef(name = "Number",
align = "left",
cell = function(value) {
width <- paste0(value / max(df_residence_party$n_residence) * 100, "%")
value <- format(value, trim=F, width=3, justify = "right")
bar_chart(value, width = width, fill = "#00bfc4")}),
rel_residence = colDef(name = "Share",
align = "left",
cell = function(value) {
width <- paste0(value * 100, "%")
value <- scales::percent(value)
bar_chart(value, width = width, fill = "orange", background = "#e1e1e1")})),
bordered=F,
compact = TRUE,
highlight = TRUE,
style = list(fontSize = "10px"),
filterable = TRUE,
theme = reactableTheme(
borderColor = "#7f7f7f",
borderWidth = 1,
backgroundColor = "#f0eff0",
filterInputStyle = list(
color="green",
backgroundColor = plot_bg_color)
))
tb_residence_party <- fn_reactable_headings(header = "Parties' number and share of non-resident candidates per election",
subtitle = "Non-resident candidates: Candidates not residing in their electoral district according to their residency zip code.",
table=tb_residence_party,
caption=caption_table)
#share in districts
df_residence_district <- df_clean %>%
filter(!str_detect(election, "Stadt")) %>%
group_by(election, wahlkreis, residence) %>%
summarize(n_residence=n()) %>%
mutate(rel_residence=n_residence/sum(n_residence, na.rm = T)) %>%
filter(residence=="outside") %>%
ungroup() %>%
arrange(election, desc(rel_residence))
#fn for barchart
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
bar <- div(style = list(background = fill, width = width, height = height))
chart <- div(style = list(flexGrow = 1, marginRight = "8px", background = background), bar)
div(style = list(display = "flex", alignItems = "right"), chart, label)
}
#assemble table
tb_residence_district <- df_residence_district %>%
reactable(.,
columns = list(
election=colDef(name="Election"),
wahlkreis=colDef(name="Electoral district"),
residence=colDef(show=FALSE),
n_residence = colDef(name = "Number",
align = "left",
cell = function(value) {
width <- paste0(value / max(df_residence_district$n_residence) * 100, "%")
value <- format(value, trim=F, width=3, justify = "right")
bar_chart(value, width = width, fill = "#00bfc4")}),
rel_residence = colDef(name = "Share",
align = "left",
cell = function(value) {
# width <- paste0(value / max(df_residence_district$rel_residence) * 100, "%")
width <- paste0(value * 100, "%")
value <- scales::percent(value)
bar_chart(value, width = width, fill = "orange", background = "#e1e1e1")})),
bordered=F,
compact = TRUE,
highlight = TRUE,
style = list(fontSize = "10px"),
filterable = TRUE,
theme = reactableTheme(
borderColor = "#7f7f7f",
borderWidth = 1,
backgroundColor = "#f0eff0",
filterInputStyle = list(
color="green",
backgroundColor = plot_bg_color)
))
tb_residence_district <- fn_reactable_headings(header = "Share of non-resident candidates per electoral district", subtitle="",
table=tb_residence_district,
caption=caption_table)
Finally, let’s contrast parties’ non-resident candidate ratios for the election to the city council and the district councils.
diff_share_non_residency <- df_clean %>%
filter(!str_detect(election, "Stadt")) %>%
group_by(election, party, residence) %>%
summarize(n_residence=n()) %>%
mutate(rel_residence=n_residence/sum(n_residence, na.rm = T)) %>%
filter(residence=="outside") %>%
ungroup() %>%
group_by(party) %>%
mutate(n_elections=n()) %>%
filter(n_elections==2)
diff_share_non_residency_wide <- diff_share_non_residency %>%
pivot_wider(id_cols=party, names_from=election,
values_from=rel_residence) %>%
mutate(city_larger_district=case_when(Gemeinderat > Bezirksvertretungen ~ "GR größer",
Gemeinderat < Bezirksvertretungen ~ "GR kleiner"))
pl_diff_share_non_residency<- diff_share_non_residency %>%
drop_na() %>%
ggplot()+
labs(title=glue::glue("WIEN-WAHL 2020: Share of non-resident candidates per party.<br><span style=color:{paletteer_d('ggsci::default_jama')[2]}>City council (Gemeinderat)</span> vs <span style=color:{paletteer_d('ggsci::default_jama')[3]}>district councils (Bezirksräte)</span>"),
subtitle="Non-resident candidates: Candidates with different residence zip-code than their electoral district",
x="Share of non-resident candiates",
caption=my_caption)+
geom_segment(data=diff_share_non_residency_wide,
aes(y=reorder(party, -Gemeinderat),
yend=reorder(party, -Gemeinderat),
x=Gemeinderat,
xend=Bezirksvertretungen),
color="grey50")+
geom_point(aes(y=reorder(party, -rel_residence),
x=rel_residence,
shape=election,
color=election),
size=2)+
scale_color_manual(values=c("Gemeinderat"=paletteer_d("ggsci::default_jama")[c(2)],
"Bezirksvertretungen"=paletteer_d("ggsci::default_jama")[c(3)]),
label=c("Gemeinderat"="City council",
"Bezirksvertretungen"="District councils"))+
scale_shape(guide="none")+
scale_x_percent()+
theme_post()+
theme(legend.position = "none",
legend.direction = "horizontal",
legend.title=element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.major.x = element_line())
The graph shows that almost all parties feature a smaller share of non-resident candidates in the elections to the city council than in the elections to the district councils. This difference is particularly strong for the FPÖ and for Team HC Strache. However, there are two notable exceptions. On is WIFF - Wir für Floridsdorf, a party with a clear programmatic focus for one specific district. Hence, it doesn’t come as a surprise that the share of non-resident candidates for the district council is low. The second exceptions are the Greens. I am second guessing here, but I could imagine that this is indicative for a strong ‘grass-root’, locally based party organization and reflecting the engagement of candidates who actually live in their electoral districts. I find the graph in any case quite noteworthy.
So that’s it for now. Again, this has been largely about retrieving and crunching numbers, but I guess some bits are worth entertaining more substantively..
Text and figures are licensed under Creative Commons Attribution CC BY-NC-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".